18 research outputs found
What Can Help Pedestrian Detection?
Aggregating extra features has been considered as an effective approach to
boost traditional pedestrian detection methods. However, there is still a lack
of studies on whether and how CNN-based pedestrian detectors can benefit from
these extra features. The first contribution of this paper is exploring this
issue by aggregating extra features into CNN-based pedestrian detection
framework. Through extensive experiments, we evaluate the effects of different
kinds of extra features quantitatively. Moreover, we propose a novel network
architecture, namely HyperLearner, to jointly learn pedestrian detection as
well as the given extra feature. By multi-task training, HyperLearner is able
to utilize the information of given features and improve detection performance
without extra inputs in inference. The experimental results on multiple
pedestrian benchmarks validate the effectiveness of the proposed HyperLearner.Comment: Accepted to IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR) 201
Repulsion Loss: Detecting Pedestrians in a Crowd
Detecting individual pedestrians in a crowd remains a challenging problem
since the pedestrians often gather together and occlude each other in
real-world scenarios. In this paper, we first explore how a state-of-the-art
pedestrian detector is harmed by crowd occlusion via experimentation, providing
insights into the crowd occlusion problem. Then, we propose a novel bounding
box regression loss specifically designed for crowd scenes, termed repulsion
loss. This loss is driven by two motivations: the attraction by target, and the
repulsion by other surrounding objects. The repulsion term prevents the
proposal from shifting to surrounding objects thus leading to more crowd-robust
localization. Our detector trained by repulsion loss outperforms all the
state-of-the-art methods with a significant improvement in occlusion cases.Comment: Accepted to IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 201
MegDet: A Large Mini-Batch Object Detector
The improvements in recent CNN-based object detection works, from R-CNN [11],
Fast/Faster R-CNN [10, 31] to recent Mask R-CNN [14] and RetinaNet [24], mainly
come from new network, new framework, or novel loss design. But mini-batch
size, a key factor in the training, has not been well studied. In this paper,
we propose a Large MiniBatch Object Detector (MegDet) to enable the training
with much larger mini-batch size than before (e.g. from 16 to 256), so that we
can effectively utilize multiple GPUs (up to 128 in our experiments) to
significantly shorten the training time. Technically, we suggest a learning
rate policy and Cross-GPU Batch Normalization, which together allow us to
successfully train a large mini-batch detector in much less time (e.g., from 33
hours to 4 hours), and achieve even better accuracy. The MegDet is the backbone
of our submission (mmAP 52.5%) to COCO 2017 Challenge, where we won the 1st
place of Detection task